
    Relational Playground: Teaching the Duality of Relational Algebra and SQL

    Full text link
    Students in introductory data management courses are often taught how to write queries in SQL. This is a useful and practical skill, but it gives limited insight into how queries are processed by relational database engines. In contrast, relational algebra is a common internal representation of queries in database engines, but it can be challenging for students to grasp. We developed a tool, Relational Playground, that lets database students explore the connection between relational algebra and SQL.
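    The duality the tool teaches can be illustrated with a minimal sketch in Python, where relations are lists of dicts and the algebra operators are plain functions (the table and column names below are illustrative, not from the tool):

```python
def select(relation, predicate):
    """sigma: keep the rows satisfying the predicate (SQL WHERE)."""
    return [row for row in relation if predicate(row)]

def project(relation, columns):
    """pi: keep only the named columns (SQL SELECT list)."""
    return [{c: row[c] for c in columns} for row in relation]

students = [
    {"name": "Ada", "year": 2},
    {"name": "Bob", "year": 4},
]

# SQL:  SELECT name FROM students WHERE year > 2
# RA:   pi_{name}(sigma_{year > 2}(students))
result = project(select(students, lambda r: r["year"] > 2), ["name"])
print(result)  # [{'name': 'Bob'}]
```

    Reading the nested function calls inside-out mirrors how an engine evaluates the algebra tree bottom-up, which is exactly the correspondence the SQL syntax hides.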

    JSONoid: Monoid-based Enrichment for Configurable and Scalable Data-Driven Schema Discovery

    Full text link
    Schema discovery is an important aspect of working with data in formats such as JSON. Unlike relational databases, JSON data sets often do not have associated structural information. Consumers of such datasets are often left to browse through data in an attempt to observe commonalities in structure across documents to construct suitable code for data processing. However, this process is time-consuming and error-prone. Existing distributed approaches to mining schemas present a significant usability advantage, as they provide useful metadata for large data sources. However, depending on the data source, ad hoc queries for estimating other properties to help with crafting an efficient data pipeline can be expensive. We propose JSONoid, a distributed schema discovery process augmented with additional metadata in the form of monoid data structures that are easily maintainable in a distributed setting. JSONoid subsumes several existing approaches to distributed schema discovery with similar performance. Our approach also adds significant useful additional information about data values to discovered schemas with linear scalability.
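    The monoid idea can be sketched as follows (a hypothetical simplification, not JSONoid's actual structures): each schema property is a value with an identity element and an associative combine operation, so partial summaries from different partitions can be merged in any order or grouping.

```python
from functools import reduce

def unit():
    # identity element: the summary of zero documents
    return {"count": 0, "keys": set(), "max_key_len": 0}

def combine(a, b):
    # associative merge of two partial summaries
    return {
        "count": a["count"] + b["count"],
        "keys": a["keys"] | b["keys"],
        "max_key_len": max(a["max_key_len"], b["max_key_len"]),
    }

def summarize(doc):
    # lift a single JSON object into the monoid
    return {"count": 1, "keys": set(doc), "max_key_len": max(map(len, doc), default=0)}

docs = [{"id": 1, "name": "a"}, {"id": 2, "email": "x@y.z"}]
summary = reduce(combine, map(summarize, docs), unit())
print(summary["count"], sorted(summary["keys"]))  # 2 ['email', 'id', 'name']
```

    Because combine is associative with unit() as identity, a distributed engine can fold partitions independently and merge the results, which is what makes such summaries cheap to maintain at scale.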

    Comprehending Semantic Types in JSON Data with Graph Neural Networks

    Full text link
    Semantic types are a more powerful and detailed way of describing data than atomic types such as strings or integers. They establish connections between columns and concepts from the real world, providing more nuanced and fine-grained information that can be useful for tasks such as automated data cleaning, schema matching, and data discovery. Existing deep learning models trained on large text corpora have been successful at performing single-column semantic type prediction for relational data. In this work, we propose an extension of the semantic type prediction problem to JSON data, labeling the types based on JSON Paths. JSON Path is a query language that enables navigation of complex JSON data structures by specifying the location and content of elements; JSON Paths play a role analogous to columns in relational data. We use a graph neural network to comprehend the structural information within collections of JSON documents. Our model outperforms a state-of-the-art existing model in several cases. These results demonstrate the ability of our model to understand complex JSON data and its potential usage for JSON-related data processing tasks.
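    The role JSON Paths play as labels can be seen in a small sketch that enumerates the paths of a nested document (the path notation below is a simplified JSONPath variant with array indices collapsed to `[*]`; the document is made up):

```python
def json_paths(value, prefix="$"):
    """Yield (path, leaf_value) pairs for every leaf in a JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from json_paths(child, f"{prefix}.{key}")
    elif isinstance(value, list):
        for child in value:          # collapse array indices into [*]
            yield from json_paths(child, f"{prefix}[*]")
    else:
        yield prefix, value

doc = {"name": "Ada", "address": {"city": "London"}, "phones": [{"number": "555"}]}
paths = sorted({p for p, _ in json_paths(doc)})
print(paths)
# ['$.address.city', '$.name', '$.phones[*].number']
```

    Each such path would receive a semantic type label (e.g. person name, city, phone number), just as each column does in the relational version of the problem.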

    Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources

    Get PDF
    Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD. Calcite's architecture consists of a modular and extensible query optimizer with hundreds of built-in optimization rules, a query processor capable of processing a variety of query languages, an adapter architecture designed for extensibility, and support for heterogeneous data models and stores (relational, semi-structured, streaming, and geospatial). This flexible, embeddable, and extensible architecture is what makes Calcite an attractive choice for adoption in big-data frameworks. It is an active project that continues to introduce support for new types of data sources, query languages, and approaches to query processing and optimization. Comment: SIGMOD'1
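    The rule-based optimizer architecture can be sketched with a toy example (this is not Calcite's actual API; node and rule names are invented): a rewrite rule pattern-matches on a fragment of the logical plan and returns an equivalent, cheaper fragment, here pushing a Filter below a Project.

```python
class Node:
    """A toy logical plan node: an operator, one child, and attributes."""
    def __init__(self, op, child=None, **attrs):
        self.op, self.child, self.attrs = op, child, attrs

def push_filter_below_project(node):
    # Filter(Project(x)) -> Project(Filter(x)), legal when the filter only
    # references columns that the Project keeps.
    if (node.op == "Filter" and node.child is not None
            and node.child.op == "Project"
            and set(node.attrs["cols"]) <= set(node.child.attrs["cols"])):
        project = node.child
        new_filter = Node("Filter", project.child, **node.attrs)
        return Node("Project", new_filter, **project.attrs)
    return node

scan = Node("Scan", table="emp")
plan = Node("Filter", Node("Project", scan, cols=["name", "dept"]),
            cols=["dept"], pred="dept = 10")
optimized = push_filter_below_project(plan)
print(optimized.op, "->", optimized.child.op, "->", optimized.child.child.op)
# Project -> Filter -> Scan
```

    A real optimizer applies hundreds of such rules, driven by a cost model, until the plan reaches a fixed point; the modularity comes from each rule being an independent, local transformation.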

    Physical Design for Non-relational Data Systems

    Get PDF
    Decades of research have gone into the optimization of physical designs, query execution, and related tools for relational databases. These techniques and tools make it possible for non-expert users to make effective use of relational database management systems. However, the drive for flexible data models and increased scalability has spawned a new generation of data management systems which largely eschew the relational model. These include NoSQL databases and distributed analytics frameworks such as Apache Spark, which make use of a diverse set of data models. Optimization techniques and tools developed for relational data do not directly apply in this setting, leaving developers who use these systems needing to become intimately familiar with system details to obtain good performance. We present techniques and tools for physical design for non-relational data systems. We explore two settings: NoSQL database systems and distributed analytics frameworks. While NoSQL databases often avoid explicit schema definitions, many choices on how to structure data remain, and these choices can have a significant impact on application performance. The data structuring process normally requires expert knowledge of the underlying database. We present the NoSQL Schema Evaluator (NoSE): given a target workload, NoSE produces an optimized physical design for NoSQL database applications which compares favourably to schemas designed by expert users. To enable existing applications to benefit from conceptual modeling, we also present an algorithm to recover a logical model from a denormalized database instance. Our second setting is distributed analytics frameworks such as Apache Spark. As with NoSQL databases, expert knowledge of Spark is often required to construct efficient data pipelines. In NoSQL systems, a key challenge is how to structure stored data; in Spark, a key challenge is how to cache intermediate results.
We examine a particularly common scenario in Spark: performing iterative analysis on an input dataset. We show that jobs written in an intuitive manner using existing Spark APIs can have poor performance. We propose ReSpark, which automates caching decisions for iterative Spark analyses. Like NoSE, ReSpark makes it possible for non-expert users to obtain good performance from a non-relational data system.
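    The caching problem ReSpark automates can be sketched with a hypothetical lazy dataset (not ReSpark's or Spark's actual API): in a lazy pipeline, an intermediate result that is reused across iterations is recomputed on every iteration unless it is explicitly cached.

```python
class Dataset:
    """Toy stand-in for a lazily evaluated distributed dataset."""
    def __init__(self, compute):
        self._compute = compute
        self._cached = None
        self.computations = 0   # count recomputations to expose the cost

    def collect(self):
        if self._cached is not None:
            return self._cached
        self.computations += 1
        return self._compute()

    def cache(self):
        # materialize once; later collect() calls reuse the result
        self._cached = self._compute()
        self.computations += 1
        return self

# Iterative analysis without caching: the input is recomputed every iteration.
cleaned = Dataset(lambda: [x * 2 for x in range(5)])
for _ in range(3):
    sum(cleaned.collect())
print(cleaned.computations)  # 3

# With caching, the expensive intermediate is computed exactly once.
cleaned = Dataset(lambda: [x * 2 for x in range(5)]).cache()
for _ in range(3):
    sum(cleaned.collect())
print(cleaned.computations)  # 1
```

    Deciding *which* intermediates to cache is the hard part in real pipelines, since caching everything exhausts memory; automating that trade-off is the gap the tool addresses.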

    NoSQL Schema Design for Time-Dependent Workloads

    Full text link
    In this paper, we propose a schema optimization method for time-dependent workloads in NoSQL databases. Our method migrates the schema as the workload changes; the estimated costs of query execution and of migration are formulated and minimized together as a single integer linear programming problem. Furthermore, we propose a method to reduce the number of optimization candidates by iteratively abstracting the time dimension and optimizing the workload while updating constraints.
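    The objective being minimized can be illustrated with a brute-force sketch (all costs and schema names are made up, and a real formulation would use an ILP solver rather than enumeration): choose one schema per time step so that total execution cost plus migration cost between adjacent steps is minimal.

```python
from itertools import product

schemas = ["denormalized", "normalized"]
# exec_cost[t][s]: cost of serving the workload at time step t under schema s
exec_cost = [
    {"denormalized": 1, "normalized": 5},   # read-heavy period
    {"denormalized": 6, "normalized": 2},   # write-heavy period
    {"denormalized": 6, "normalized": 2},
]
MIGRATION = 3   # flat cost to switch schemas between adjacent steps

def total_cost(plan):
    run = sum(exec_cost[t][s] for t, s in enumerate(plan))
    moves = sum(MIGRATION for a, b in zip(plan, plan[1:]) if a != b)
    return run + moves

best = min(product(schemas, repeat=len(exec_cost)), key=total_cost)
print(best, total_cost(best))
# ('denormalized', 'normalized', 'normalized') 8
```

    The ILP version replaces the enumeration with binary decision variables per (time step, schema) pair, which is what keeps the search tractable as the number of candidates grows; abstracting the time dimension further shrinks the variable count.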

    The attrition rate of licensed chiropractors in California: an exploratory ecological investigation of time-trend data

    Get PDF
    BACKGROUND: The authors hypothesized the attrition rate of licensed chiropractors in California has gradually increased over the past several decades. "Attrition" as determined for this study is defined as a loss of legal authority to practice chiropractic for any reason during the first 10 years after the license was issued. The percentage of license attrition after 10 years was determined for each group of graduates licensed in California each year between 1970 and 1998. The cost of tuition, the increase in the supply of licensed chiropractors and the ratio of licensed chiropractors to California residents were examined as possible influences on the rate of license attrition. METHODS: The attrition rate was determined by a retrospective analysis of license status data obtained from the California Department of Consumer Affairs. Other variables were determined from US Bureau of Census data, survey data from the American Chiropractic Association and catalogs from a US chiropractic college. RESULTS: The 10-year attrition rate rose from 10% for those graduates licensed in 1970 to a peak of 27.8% in 1991. The 10-year attrition rate has since remained between 20-25% for the doctors licensed between 1992-1998. CONCLUSIONS: Available evidence supports the hypothesis that the attrition rate for licensed chiropractors in the first 10 years of practice has risen in the past several decades.

    A united statement of the global chiropractic research community against the pseudoscientific claim that chiropractic care boosts immunity.

    Get PDF
    BACKGROUND: In the midst of the coronavirus pandemic, the International Chiropractors Association (ICA) posted reports claiming that chiropractic care can impact the immune system. These claims clash with recommendations from the World Health Organization and World Federation of Chiropractic. We discuss the scientific validity of the claims made in these ICA reports. MAIN BODY: We reviewed the two reports posted by the ICA on their website on March 20 and March 28, 2020. We explored the method used to develop the claim that chiropractic adjustments impact the immune system and discuss the scientific merit of that claim. We provide a response to the ICA reports and explain why this claim lacks scientific credibility and is dangerous to the public. More than 150 researchers from 11 countries reviewed and endorsed our response. CONCLUSION: In their reports, the ICA provided no valid clinical scientific evidence that chiropractic care can impact the immune system. We call on regulatory authorities and professional leaders to take robust political and regulatory action against those claiming that chiropractic adjustments have a clinical impact on the immune system.

    Tropical Data: Approach and Methodology as Applied to Trachoma Prevalence Surveys

    Get PDF
    PURPOSE: Population-based prevalence surveys are essential for decision-making on interventions to achieve trachoma elimination as a public health problem. This paper outlines the methodologies of Tropical Data, which supports work to undertake those surveys. METHODS: Tropical Data is a consortium of partners that supports health ministries worldwide to conduct globally standardised prevalence surveys that conform to World Health Organization recommendations. Founding principles are health ministry ownership, partnership and collaboration, and quality assurance and quality control at every step of the survey process. Support covers survey planning, survey design, training, electronic data collection and fieldwork, and data management, analysis and dissemination. Methods are adapted to meet local context and needs. Customisations, operational research and integration of other diseases into routine trachoma surveys have also been supported. RESULTS: Between 29th February 2016 and 24th April 2023, 3373 trachoma surveys across 50 countries have been supported, resulting in 10,818,502 people being examined for trachoma. CONCLUSION: This health ministry-led, standardised approach, with support from the start to the end of the survey process, has helped all trachoma elimination stakeholders to know where interventions are needed, where interventions can be stopped, and when elimination as a public health problem has been achieved. Flexibility to meet specific country contexts, adaptation to changes in global guidance and adjustments in response to user feedback have facilitated innovation in evidence-based methodologies, and supported health ministries to strive for global disease control targets.